This notebook will include informal meta-analyses of different metrics and methods for evaluating surgical skill.
The reported metrics compare differences between novices and expert surgeons.
It is informal because it is not based on a systematic review, and because some studies have been included under very relaxed conditions. For example, I picked the novices and experts without comparing their definitions between studies: novice = weakest skill group in the study, expert = strongest skill group in the study. If a study included more than two groups, I picked the results of the weakest (= novice) and strongest (= expert) groups and discarded the others. If a study included more than one task, or several sub-tasks, I picked the one with the largest difference between groups.
Many papers did not report means and standard deviations explicitly, so they had to be estimated from boxplots/barplots, or by some other means.
For example, sometimes studies reported only mean or median, but no SE/SD. I estimated the SD/SE in those cases based e.g. on the SD of some other similar metric that they reported, or the SD of previous results for the same metric. See the excel file for notes on each study.
This may or may not be turned into a more systematic meta-analysis later.
Example metrics that will be most likely included (Bolded ones have priority)
Full list of papers and metrics can be found in the excel file shared in the repo:
Last update: 19.7.2022. Added more results. Changed Laparoscopy -> Endoscopy, so all endoscopic procedures are now labeled ‘endoscopy’.
If you notice errors or know some good studies to be included, feel free to forward them to
jani.koskinen [ at ] uef.fi
or use the form below TBD
These values are used as input in the R meta package’s metagen function.
For more information, check:
Forest plot explanation
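The `g` and `SDg` columns passed to `metagen` are not derived in this notebook. As a minimal sketch of one common way to obtain them, the helper below computes Hedges' g with the usual small-sample correction. The function name `hedges_g` and its arguments are hypothetical, and the excel file may have used a different variant of the formula.

```r
# Hypothetical helper: Hedges' g and its standard error from group summaries.
# Assumes two independent groups; not necessarily the formula used in the excel file.
hedges_g <- function(m_nov, sd_nov, n_nov, m_exp, sd_exp, n_exp) {
  df <- n_nov + n_exp - 2
  # Pooled standard deviation of the two groups
  sd_pooled <- sqrt(((n_nov - 1) * sd_nov^2 + (n_exp - 1) * sd_exp^2) / df)
  d <- (m_nov - m_exp) / sd_pooled   # Cohen's d
  J <- 1 - 3 / (4 * df - 1)          # small-sample correction factor
  # Approximate sampling variance of d
  var_d <- (n_nov + n_exp) / (n_nov * n_exp) + d^2 / (2 * (n_nov + n_exp))
  c(g = J * d, SDg = J * sqrt(var_d))
}

# Made-up numbers, for illustration only
hedges_g(m_nov = 10, sd_nov = 2, n_nov = 10, m_exp = 8, sd_exp = 2, n_exp = 10)
```

A positive `g` here means the novice mean is larger than the expert mean, which matches the sign of the task time results (novices are slower).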
Some general statistics of the studies included:
Number of unique studies: 88
Number of studies by surgical technique:
| Technique | Count |
|---|---|
| Endoscopy | 44 |
| Microsurgery | 14 |
| Open Surgery | 12 |
| Radiography | 1 |
| Robotic Surgery | 8 |
Number of studies by metric:
| Metric | Count |
|---|---|
| task_time | 35 |
| tool_path_length | 24 |
| tool_velocity | 16 |
| tool_idle | 8 |
| tool_movements | 16 |
| tool_jerk | 14 |
| tool_acceleration | 8 |
| tool_bimanual | 7 |
| pupil_dilation | 7 |
| tool_force | 12 |
| scale_OSATS | 9 |
How many samples are needed at a given effect size d? The calculation uses a t-test at alpha = 0.05 and power = 0.8, and assumes independent trials (e.g. no repeated measurements from the same participants).
Hover the mouse over the points in the plot to see the values. The sample size is per group, so you need this many samples in each group.
Some baseline effect sizes from the meta-analyses given as baseline:
IT = Idle Time
TT = Task Time
BD = Bimanual Dexterity
TEPR = Task-Evoked Pupil Reaction/Dilation (Estimated with one outlier study removed)
TJ = Tool Jerk
TF = Tool Force
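The code behind the sample size plot is not shown here. A minimal sketch of the same kind of calculation, assuming a two-sided, two-sample t-test and using base R's `power.t.test` (the helper name `n_per_group` is hypothetical, and the effect sizes in the example are arbitrary rather than the meta-analysis estimates):

```r
# Per-group sample size for a two-sample, two-sided t-test
# at alpha = 0.05 and power = 0.8, for a given effect size d.
n_per_group <- function(d, alpha = 0.05, power = 0.8) {
  # With sd = 1, delta is the standardized effect size d
  res <- power.t.test(delta = d, sd = 1, sig.level = alpha, power = power,
                      type = "two.sample", alternative = "two.sided")
  ceiling(res$n)  # round up to whole samples
}

# Example effect sizes (arbitrary values)
sapply(c(0.5, 0.8, 1, 2), n_per_group)
```

Because the required sample size falls roughly with the square of d, the large effects reported in this notebook translate into small groups, provided the independence assumption holds.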
Task time is the time taken to complete a task. The task can be short, like tying a single knot, or a longer, more complex procedure.
Load data
library(readxl)  # provides read_excel
df.time <- read_excel('data/surgical_metrics.xlsx', sheet='task_time')
Print studies
| Author | Year | Study | Journal | Note |
|---|---|---|---|---|
| Koskinen et al. | 2022 | Utilizing Grasp Monitoring to Predict Microsurgical Expertise | Journal of Surgical Research | NA |
| Chainey et al. | 2021 | Eye-Hand Coordination of Neurosurgeons: Evidence of Action-Related Fixation in Microsuturing | World Neurosurgery | NA |
| Harada et al. | 2015 | Assessing microneurosurgical skill with medico-engineering technology | World Neurosurgery | effects estimated from boxplot |
| Vedula et al. | 2016 | Task-Level vs. Segment-Level Quantitative Metrics for Surgical Skill Assessment | Journal of Surgical Education | effects estimated from barplot. Sample size per group not given, estimated from total sample (135 trials total, 4 experts, 14 novices, expert sample size rounded from (4/18)*135) |
| Judkins et al. | 2009 | Objective evaluation of expert and novice performance during robotic surgical training tasks | Surgical Endoscopy | effect estimated from barplot. Novices pre-training, three trials each, five novices and five experts |
| Smith et al. | 2002 | Motion analysis: A tool for assessing laparoscopic dexterity in the performance of a laboratory-based laparoscopic cholecystectomy | Surgical Endoscopy and Other Interventional Techniques | Worst and best groups compared, novices have performed < tasks, experts >100 |
| Francis et al. | 2002 | The performance of master surgeons on the Advanced Dundee Endoscopic Psychomotor Tester: Contrast validity study | Archives of Surgery | effects estimated from boxplots |
| Moorthy et al. | 2004 | Bimodal assessment of laparoscopic suturing skills: Construct and concurrent validity | Surgical Endoscopy and Other Interventional Techniques | NA |
| Van Sickle et al. | 2008 | Construct validity of an objective assessment method for laparoscopic intracorporeal suturing and knot tying | The American Journal of Surgery | The expert group had only 2 trials and vastly outperformed the other groups (task time 15.6 sec!), so I used the trained residents (the second most experienced group) instead |
| Xeroulis et al. | 2009 | Simulation in laparoscopic surgery: A concurrent validity study for FLS | Surgical Endoscopy and Other Interventional Techniques | effect sizes estimated from barplot |
| Huffman et al. | 2020 | Optimizing Assessment of Surgical Knot Tying Skill | Journal of Surgical Education | By hand, did not use instruments |
| Law et al. | 2004 | Eye gaze patterns differentiate novice and experts in a virtual laparoscopic surgery training environment | Proceedings of the Eye tracking research & applications symposium on Eye tracking research & applications - ETRA’2004 | NA |
| Kazemi et al. | 2010 | Assessing suturing techniques using a virtual reality surgical simulator | Microsurgery | task completed in VR. Medical students and medical surgeons compared. Times estimated from barplot |
| O’Toole et al. | 1999 | Measuring and Developing Suturing Technique with a Virtual Reality Surgical Simulator | Journal of the American College of Surgeons | Virtual reality, times from the trial taken after training |
| Zheng et al. | 2021 | Action-related eye measures to assess surgical expertise | BJS Open | Transporting and loading task |
| Datta et al. | 2001 | The use of electromagnetic motion tracking analysis to objectively measure open surgical skill in the laboratory-based model | Journal of the American College of Surgeons | Used ICSAD system to record data. Several skill groups, here we compare basic surgical trainees and consultants |
| Pagador et al. | 2012 | Decomposition and analysis of laparoscopic suturing task using tool-motion analysis (TMA): Improving the objective assessment | International Journal of Computer Assisted Radiology and Surgery | First subtask results |
| Aggarwal et al. | 2007 | An evaluation of the feasibility, validity, and reliability of laparoscopic skills assessment in the operating room | Annals of Surgery | Whole procedure, paper reports medians and inter-quartile ranges, the SDs are calculated from these (IQR*(3/4)) |
| Wilson et al. | 2010 | Psychomotor control in a virtual laparoscopic surgery training environment: Gaze control parameters differentiate novices from experts | Surgical Endoscopy | NA |
| Hofstad et al. | 2013 | A study of psychomotor skills in minimally invasive surgery: What differentiates expert and nonexpert performance | Surgical Endoscopy and Other Interventional Techniques | Estimated effects and SDs from barplots. Reports left/right hand separately, I used left hand results |
| Hung et al. | 2018 | Development and Validation of Objective Performance Metrics for Robot-Assisted Radical Prostatectomy: A Pilot Study | Journal of Urology | Values given as mean and 95% conf interval. SD calculated from conf interval by sqrt(N)*(upper lim - lower lim)/3.92 |
| Yamaguchi et al. | 2011 | Objective assessment of laparoscopic suturing skills using a motion-tracking system | Surgical Endoscopy | Results for the whole procedure |
| Pellen et al. | 2009 | Laparoscopic surgical skills assessment: Can simulators replace experts? | World Journal of Surgery | Estimated effects and SDs from boxplots. |
| Pastewski et al. | 2021 | Analysis of Instrument Motion and the Impact of Residency Level and Concurrent Distraction on Laparoscopic Skills | Journal of Surgical Education | Used results for without secondary task |
| Chmarra et al. | 2010 | Objective classification of residents based on their psychomotor laparoscopic skills | Surgical Endoscopy and Other Interventional Techniques | Values estimated from plots, used the pipe cleaner task results. |
| Rittenhouse et al. | 2014 | Design and validation of an assessment tool for open surgical procedures | Surgical Endoscopy | Used Wii (IR sensor) and Patriot EM tracking. Results are for the Patriot tracking system. Values estimated from barplot (Fig. 6) |
| Mackenzie et al. | 2021 | Enhanced Training Benefits of Video Recording Surgery With Automated Hand Motion Analysis | World Journal of Surgery | Values given as means and ranges. Compared experts and residents post-training. SD for idle time not given, estimated from variance of total active time. |
| Mazomenos et al. | 2016 | Catheter manipulation analysis for objective performance and technical skills assessment in transcatheter aortic valve implantation | International Journal of Computer Assisted Radiology and Surgery | Task was performed with conventional tools and with robotic tools. Results are for conventional tools. There were 2 stages, results here are for stage 1. SDs evaluated from boxplots (Fig. 5). Expert jerk weirdly small? |
| Amiel et al. | 2020 | Experienced surgeons versus novice surgery residents: Validating a novel knot tying simulator for vessel ligation | Surgery | 4 different knot types, each completed twice. 15 experts and 30 novices. Results are for the deep two hand knot (Fig. 2). Effects estimated from the plot, for Total Force. |
| Balasundaram et al. | 2022 | Acquisition of microvascular suturing techniques is feasible using objective measures of performance outside of the operating room | British Journal of Oral and Maxillofacial Surgery | Results for novices are for post-intervention (training), fig 5. Effects estimated from the figure. |
| Franco-González et al. | 2021 | Development of a 3D Motion Tracking System for the Analysis of Skills in Microsurgery | Journal of Medical Systems | Values are for the suturing task |
| Berges et al. | 2022 | Eye Tracking and Motion Data Predict Endoscopic Sinus Surgery Skill | Laryngoscope | Participants completed 9 tasks. Results are for total time |
| Saleh et al. | 2006 | Evaluating surgical dexterity during corneal suturing | Archives of Ophthalmology | Values given as medians and inter-quartile ranges. Values are for novice and expert surgeons (Table) |
| Balal et al. | 2019 | Computer analysis of individual cataract surgery segments in the operating room | Eye (Basingstoke) | Results from Table 1 for CCC |
| Pérez-Escamirosa | 2020 | Design of a Dynamic Force Measurement System for Training and Evaluation of Suture Surgical Skills | Journal of Medical Systems | Task time values from Table 2 |
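Several notes in the table above derive standard deviations from other reported statistics. The two conversions quoted there can be sketched as small helpers (the function names are hypothetical, and the example inputs are made up):

```r
# SD from an inter-quartile range, using the IQR * (3/4) rule quoted in the notes.
# For normally distributed data the IQR is about 1.35 * SD, so this is a close approximation.
sd_from_iqr <- function(iqr) iqr * 3 / 4

# SD from a 95% confidence interval of the mean:
# the interval spans 2 * 1.96 = 3.92 standard errors, and SD = SE * sqrt(N).
sd_from_ci95 <- function(lower, upper, n) sqrt(n) * (upper - lower) / 3.92

sd_from_iqr(20)                # made-up IQR
sd_from_ci95(90, 110, n = 16)  # made-up interval and sample size
```

Both conversions assume roughly normal data, which is worth keeping in mind for skewed metrics such as task time.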
library(meta)  # provides metagen and forest.meta
m.time <- metagen(TE=g,
seTE=SDg,
studlab=Author,
data=df.time,
sm="SMD",
fixed=FALSE,
random=TRUE,
method.tau="REML",
hakn=TRUE,
title="Time to completion in Surgery")
summary(m.time)
## Review: Time to completion in Surgery
##
## SMD 95%-CI %W(random)
## Koskinen et al. 1.8413 [ 1.4135; 2.2691] 3.3
## Chainey et al. 0.7034 [ 0.0383; 1.3686] 3.2
## Harada et al. 1.5503 [ 0.8551; 2.2456] 3.2
## Vedula et al. 2.2149 [ 1.7299; 2.6999] 3.3
## Judkins et al. 5.3971 [ 3.8216; 6.9726] 2.6
## Smith et al. 8.0559 [ 5.5551; 10.5568] 1.9
## Francis et al. 0.9801 [ 0.3227; 1.6375] 3.2
## Moorthy et al. 1.4157 [ 0.3397; 2.4917] 3.0
## Van Sickle et al. 2.1365 [ 1.0202; 3.2528] 2.9
## Xeroulis et al. 2.5525 [ 1.2650; 3.8400] 2.8
## Huffman et al. 6.5116 [ 4.9174; 8.1059] 2.6
## Law et al. 2.0257 [ 1.3401; 2.7112] 3.2
## Kazemi et al. 0.8354 [-0.3084; 1.9791] 2.9
## O'Toole et al. 1.7086 [ 0.6569; 2.7602] 3.0
## Zheng et al. 1.9382 [ 0.6894; 3.1871] 2.8
## Datta et al. 2.1791 [ 1.1762; 3.1819] 3.0
## Pagador et al. 6.3695 [ 2.5221; 10.2170] 1.2
## Aggarwal et al. 0.1873 [-0.4390; 0.8136] 3.2
## Wilson et al. 1.3349 [ 0.1520; 2.5179] 2.9
## Hofstad et al. 1.3791 [ 0.3199; 2.4382] 3.0
## Hung et al. 2.2342 [ 1.7203; 2.7481] 3.3
## Yamaguchi et al. 4.5870 [ 2.7625; 6.4116] 2.4
## Pellen et al. 5.6362 [ 3.6128; 7.6596] 2.3
## Pastewski et al. 0.5600 [-0.1171; 1.2371] 3.2
## Chmarra et al. 0.9810 [ 0.0706; 1.8914] 3.1
## Rittenhouse et al. 3.9042 [ 2.1253; 5.6832] 2.5
## Mackenzie et al. 0.5613 [-1.5135; 2.6361] 2.2
## Mazomenos et al. 2.3487 [ 0.8266; 3.8708] 2.6
## Amiel et al. 1.8403 [ 1.3249; 2.3556] 3.3
## Balasundaram et al. 0.8384 [-0.0792; 1.7559] 3.1
## Franco-González et al. 1.7391 [ 0.4474; 3.0309] 2.8
## Berges et al. 1.0838 [ 0.7576; 1.4101] 3.3
## Saleh et al. 4.7016 [ 2.9459; 6.4574] 2.5
## Balal et al. 1.6436 [ 0.9231; 2.3642] 3.2
## Pérez-Escamirosa 1.6479 [ 0.4946; 2.8012] 2.9
##
## Number of studies combined: k = 35
##
## SMD 95%-CI t p-value
## Random effects model 2.2236 [1.6224; 2.8248] 7.52 < 0.0001
##
## Quantifying heterogeneity:
## tau^2 = 2.2022 [1.5060; 5.3980]; tau = 1.4840 [1.2272; 2.3234]
## I^2 = 83.9% [78.5%; 88.0%]; H = 2.49 [2.16; 2.88]
##
## Test of heterogeneity:
## Q d.f. p-value
## 211.46 34 < 0.0001
##
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model
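The I^2 and H values in the output above follow directly from Q and its degrees of freedom. A small sketch in base R, where the helper name `het_from_q` is hypothetical and the formulas are the standard Higgins and Thompson definitions, assumed here to match what the meta package reports:

```r
# Heterogeneity statistics recomputed from Cochran's Q and its degrees of freedom
het_from_q <- function(Q, df) {
  c(I2 = max(0, (Q - df) / Q),  # share of variability beyond sampling error
    H  = sqrt(Q / df))          # ratio of observed to expected dispersion
}

# Q and d.f. taken from the task time output above
het_from_q(Q = 211.46, df = 34)
```

With these inputs, I2 rounds to 83.9% and H to 2.49, in line with the printed summary.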
Plot forest
forest.meta(m.time, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Time to completion in Surgery")
#dev.print(pdf, "figures/forest_time.pdf", width=10, height=10)
With enough results, we can do regression analysis to compare e.g. how the effect sizes differed between surgical techniques.
First, plot by surgical technique (red labels show the number of studies):
n_obs <- function(x){
return(c(y=0, label=length(x)))
}
library(ggplot2)  # provides ggplot, geom_boxplot and stat_summary
ggplot(df.time, aes(x=Technique, y=SMD)) +
  geom_boxplot() +
  stat_summary(fun.data = n_obs, colour = "red", size = 5, geom = "text")
Fit linear model with Technique as explanatory variable. Microsurgery effect size is used as baseline (intercept).
df.time$Technique <- as.factor(df.time$Technique)
df.time <- within(df.time, Technique <- relevel(Technique, ref="Microsurgery"))
lm.time <- lm(SMD ~ Technique, data=df.time)
summary(lm.time)
##
## Call:
## lm(formula = SMD ~ Technique, data = df.time)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.5495 -1.1863 -0.6979 0.0592 5.6037
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.7962 0.7400 2.427 0.0212 *
## TechniqueEndoscopy 0.9438 0.8822 1.070 0.2929
## TechniqueOpen Surgery 0.8982 1.1933 0.753 0.4573
## TechniqueRobotic Surgery 1.5461 1.4170 1.091 0.2837
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.093 on 31 degrees of freedom
## Multiple R-squared: 0.05038, Adjusted R-squared: -0.04152
## F-statistic: 0.5482 on 3 and 31 DF, p-value: 0.6531
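The linear model above gives every study the same weight, regardless of how precisely its effect size was estimated. As a hedged sketch of one alternative, the example below weights each study by its inverse variance, using toy data invented purely for illustration (the column names mirror `SMD`, `SDg` and `Technique`, but none of the values come from the excel file):

```r
# Toy data, invented for illustration only (not values from the excel file)
toy <- data.frame(
  SMD       = c(1.8, 0.7, 2.2, 1.4, 2.5, 5.4),
  SDg       = c(0.2, 0.3, 0.25, 0.5, 0.6, 0.8),
  Technique = factor(c("Microsurgery", "Microsurgery", "Endoscopy",
                       "Endoscopy", "Endoscopy", "Robotic Surgery"))
)
# Use microsurgery as the baseline level, as in the model above
toy$Technique <- relevel(toy$Technique, ref = "Microsurgery")

# Inverse-variance weights: precise studies count more than imprecise ones
lm.weighted <- lm(SMD ~ Technique, data = toy, weights = 1 / SDg^2)
coef(lm.weighted)
```

The intercept is then the precision-weighted mean of the baseline technique. This is still not a full meta-regression, since it ignores between-study variance; the meta package's `metareg` function is one option for that.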
Time to completion is by far the most often reported metric. It is often reported even when it is not the main focus of the study.
Bimanual dexterity is a measure of how well the surgeon is able to use both hands at the same time. Note that there are many different ways for calculating “ability to use both hands simultaneously.”
Load data
df.biman <- read_excel('data/surgical_metrics.xlsx', sheet='tool_bimanual')
Print studies
| Author | Year | Study | Journal | Note |
|---|---|---|---|---|
| Koskinen et al. | 2022 | Movement-level process modeling of microsurgical bimanual and unimanual tasks | International Journal of Computer Assisted Radiology and Surgery | Bimanual efficiency defined as using both hands simultaneously for something productive |
| Hofstad et al. | 2017 | Psychomotor skills assessment by motion analysis in minimally invasive surgery on an animal organ | Minimally Invasive Therapy and Allied Technologies | Bimanual dexterity defined as the correlation between the two hands’ tool movements. Values estimated from boxplots |
| Demirel et al. | 2022 | Scoring metrics for assessing skills in arthroscopic rotator cuff repair: performance comparison study of novice and expert surgeons | International Journal of Computer Assisted Radiology and Surgery | Standard deviations estimated from the standard deviations of other metrics, not given directly in the paper |
| Islam et al. | 2016 | Affordable, web-based surgical skill training and evaluation tool | Journal of Biomedical Informatics | Mean values estimated from boxplot. Standard deviations were not given, so I used values similar to those in our study (i = 0): the novices’ SD is about 1/5 of the mean, the experts’ 1/12 |
| Zulbaran-Rojas et al. | 2021 | Utilization of Flexible-Wearable Sensors to Describe the Kinematics of Surgical Proficiency | Journal of Surgical Research | I took the ratio of number of dominant and non-dominant hand movements as measure of bimanual dexterity. Other options were velocity and path length. No. Movements felt closest to our definition. |
| Mori et al. | 2022 | Validation of a novel virtual reality simulation system with the focus on training for surgical dissection during laparoscopic sigmoid colectomy | BMC Surgery | Bimanual dexterity measured in GOALS score (see paper for more information). Results given as medians and inter-quartile ranges. SD calculated from IQR as SD = IQR*(3/4) |
| Franco-González et al. | 2021 | Development of a 3D Motion Tracking System for the Analysis of Skills in Microsurgery | Journal of Medical Systems | Values are for the suturing task |
Run meta-analysis
m.biman <- metagen(TE=g,
seTE=SDg,
studlab=Author,
data=df.biman,
sm="SMD",
fixed=FALSE,
random=TRUE,
method.tau="REML",
hakn=TRUE,
title="Bimanual dexterity in Surgery")
Print results
summary(m.biman)
## Review: Bimanual dexterity in Surgery
##
## SMD 95%-CI %W(random)
## Koskinen et al. -3.0589 [ -3.8825; -2.2353] 14.9
## Hofstad et al. -3.0127 [ -4.6473; -1.3782] 13.7
## Demirel et al. -2.0314 [ -3.0378; -1.0251] 14.7
## Islam et al. -8.6969 [-10.7900; -6.6039] 12.7
## Zulbaran-Rojas et al. -0.8250 [ -1.7586; 0.1085] 14.8
## Mori et al. -2.6867 [ -3.6936; -1.6799] 14.7
## Franco-González et al. -1.2364 [ -2.4340; -0.0387] 14.4
##
## Number of studies combined: k = 7
##
## SMD 95%-CI t p-value
## Random effects model -2.9716 [-5.3010; -0.6422] -3.12 0.0205
##
## Quantifying heterogeneity:
## tau^2 = 5.4276 [1.9426; 32.4354]; tau = 2.3297 [1.3938; 5.6952]
## I^2 = 88.7% [79.2%; 93.9%]; H = 2.98 [2.19; 4.04]
##
## Test of heterogeneity:
## Q d.f. p-value
## 53.15 6 < 0.0001
##
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model
Plot forest
forest.meta(m.biman, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Bimanual dexterity in Surgery")
#dev.print(pdf, "figures/forest_biman.pdf", width=8, height=8)
Analysis of bimanual dexterity is made harder because there are so many different definitions for it.
Number of tool movements made during the task. Note: I have included here the grasp results from our paper (and other studies that analyzed only one type of action/movement)
Load data
df.toolmvt <- read_excel('data/surgical_metrics.xlsx', sheet='tool_movements')
Print studies
| Author | Year | Study | Journal | Note |
|---|---|---|---|---|
| Datta et al. | 2001 | The use of electromagnetic motion tracking analysis to objectively measure open surgical skill in the laboratory-based model | Journal of the American College of Surgeons | Used ICSAD system to record data. Several skill groups, here we compare basic surgical trainees and consultants |
| Pagador et al. | 2012 | Decomposition and analysis of laparoscopic suturing task using tool-motion analysis (TMA): Improving the objective assessment | International Journal of Computer Assisted Radiology and Surgery | Study reported left and right hand movements separately, I picked left hand |
| Koskinen et al. | 2022 | Utilizing Grasp Monitoring to Predict Microsurgical Expertise | Journal of Surgical Research | grasps |
| Bann et al. | 2003 | Measurement of surgical dexterity using motion analysis of simple bench tasks | World Journal of Surgery | Used ICSAD system to record data. Reports medians and inter-quartile ranges. |
| Smith et al. | 2002 | Motion analysis: A tool for assessing laparoscopic dexterity in the performance of a laboratory-based laparoscopic cholecystectomy | Surgical Endoscopy and Other Interventional Techniques | Multiple tasks, picked Calot’s triangle. Surgeon groups A and C compared |
| Aggarwal et al. | 2007 | An evaluation of the feasibility, validity, and reliability of laparoscopic skills assessment in the operating room | Annals of Surgery | Whole procedure, paper reports medians and inter-quartile ranges, the SDs are calculated from these (IQR*(3/4)) |
| Yamaguchi et al. | 2007 | Construct validity for eye-hand coordination skill on a virtual reality laparoscopic surgical simulator | Surgical Endoscopy and Other Interventional Techniques | Effects and SDs estimated from barplots. Reported right hand movements |
| Goldbraikh et al. | 2021 | Video-based fully automatic assessment of Open Surgery suturing skills | International Journal of Computer Assisted Radiology and Surgery | Task:Balloon dominant hand |
| Vedula et al. | 2016 | Task-Level vs. Segment-Level Quantitative Metrics for Surgical Skill Assessment | Journal of Surgical Education | Effects and SDs estimated from barplots. The paper does not give Ne/Nn directly; a total of 135 trials were performed by 14 novices and 4 experts, so I estimated the sample sizes as 135 × 14/(14+4) for novices and 135 × 4/(14+4) for experts |
| Wilson et al. | 2010 | Psychomotor control in a virtual laparoscopic surgery training environment: Gaze control parameters differentiate novices from experts | Surgical Endoscopy | reported left and right hand separately, I used left hand because usually differences are larger with non-dominant hand (all were right-handed) |
| Hofstad et al. | 2013 | A study of psychomotor skills in minimally invasive surgery: What differentiates expert and nonexpert performance | Surgical Endoscopy and Other Interventional Techniques | Estimated effects and SDs from barplots. Reports left/right hand separately, I used left hand results |
| Rittenhouse et al. | 2014 | Design and validation of an assessment tool for open surgical procedures | Surgical Endoscopy | Used Wii (IR sensor) and Patriot EM tracking. Results are for the Patriot tracking system. Values estimated from barplot (Fig. 6) |
| Balasundaram et al. | 2022 | Acquisition of microvascular suturing techniques is feasible using objective measures of performance outside of the operating room | British Journal of Oral and Maxillofacial Surgery | Results for novices are for post-intervention (training), fig 5. Effects estimated from the figure. |
| Franco-González et al. | 2021 | Development of a 3D Motion Tracking System for the Analysis of Skills in Microsurgery | Journal of Medical Systems | Values are for the suturing task |
| Saleh et al. | 2006 | Evaluating surgical dexterity during corneal suturing | Archives of Ophthalmology | Values given as medians and inter-quartile ranges. Values are for novice and expert surgeons (Table) |
| Balal et al. | 2019 | Computer analysis of individual cataract surgery segments in the operating room | Eye (Basingstoke) | Results from Table 1 for CCC |
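The Vedula et al. note in the table above estimates per-group sample sizes by splitting the total number of trials in proportion to the number of participants. That arithmetic can be sketched as follows (the helper name `split_trials` is hypothetical, and the split assumes every participant contributed about the same number of trials):

```r
# Split a total trial count between groups in proportion to participant numbers
split_trials <- function(total_trials, n_novices, n_experts) {
  n_all <- n_novices + n_experts
  c(novice = round(total_trials * n_novices / n_all),
    expert = round(total_trials * n_experts / n_all))
}

# Numbers from the Vedula et al. note: 135 trials, 14 novices, 4 experts
split_trials(135, n_novices = 14, n_experts = 4)
```

For these inputs the split is 105 novice trials and 30 expert trials.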
Run meta-analysis
m.toolmvt <- metagen(TE=g,
seTE=SDg,
studlab=Author,
data=df.toolmvt,
sm="SMD",
fixed=FALSE,
random=TRUE,
method.tau="REML",
hakn=TRUE,
title="Tool movements in Surgery")
summary(m.toolmvt)
## Review: Tool movements in Surgery
##
## SMD 95%-CI %W(random)
## Datta et al. 2.0390 [ 1.0607; 3.0174] 6.6
## Pagador et al. 10.0866 [ 4.2364; 15.9368] 1.9
## Koskinen et al. 1.3393 [ 0.7781; 1.9006] 7.0
## Bann et al. 1.2504 [ 0.4629; 2.0380] 6.8
## Smith et al. 5.9403 [ 4.0136; 7.8671] 5.5
## Aggarwal et al. -0.0641 [-0.6894; 0.5612] 6.9
## Yamaguchi et al. 2.3074 [ 1.3887; 3.2260] 6.7
## Goldbraikh et al. 2.6143 [ 1.8535; 3.3751] 6.8
## Vedula et al. 6.5233 [ 5.6418; 7.4047] 6.7
## Wilson et al. 0.5955 [-0.4889; 1.6799] 6.5
## Hofstad et al. 0.9586 [-0.0444; 1.9617] 6.6
## Rittenhouse et al. 4.2589 [ 2.3738; 6.1439] 5.6
## Balasundaram et al. 2.0810 [ 0.9757; 3.1864] 6.5
## Franco-González et al. 1.5547 [ 0.3003; 2.8091] 6.3
## Saleh et al. 1.9550 [ 0.8741; 3.0360] 6.5
## Balal et al. 1.5181 [ 0.8115; 2.2248] 6.9
##
## Number of studies combined: k = 16
##
## SMD 95%-CI t p-value
## Random effects model 2.4092 [1.2764; 3.5421] 4.53 0.0004
##
## Quantifying heterogeneity:
## tau^2 = 3.3007 [1.8235; 13.0443]; tau = 1.8168 [1.3504; 3.6117]
## I^2 = 92.3% [89.1%; 94.6%]; H = 3.61 [3.03; 4.30]
##
## Test of heterogeneity:
## Q d.f. p-value
## 195.00 15 < 0.0001
##
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model
Plot forest
forest.meta(m.toolmvt, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Tool movements in Surgery")
#dev.print(pdf, "figures/forest_toolmvt.pdf", width=8, height=8)
Tool movements are perhaps the second most often reported metric. Different papers measure, analyze and report them differently. They are often connected to “movement efficiency”.
Tool idle time measures how long the tools were not being used, either as time or as fraction of the complete task time.
Load data
df.toolidle <- read_excel('data/surgical_metrics.xlsx', sheet='tool_idle')
Print studies
| Author | Year | Study | Journal | Note |
|---|---|---|---|---|
| Koskinen et al. | 2021 | Movement-level process modeling of microsurgical bimanual and unimanual tasks | International Journal of Computer Assisted Radiology and Surgery | Reports left/right hand separately, results are for left-hand. Paper reports suturing efficiency, which is the inverse of idle time (idle time = 1 - efficiency) |
| Uemura et al. | 2015 | Procedural surgical skill assessment in laparoscopic training environments | International Journal of Computer Assisted Radiology and Surgery | Reports left/right hand separately, results are for left-hand. Max time given as 420s, every novice exceeded this. |
| D’Angelo et al. | 2015 | Idle time: An underdeveloped performance metric for assessing surgical skill | American Journal of Surgery | Does not report idle time directly per skill group, only number of idle periods. Took values from the first segment, entering tissue with needle. Did not report SD for idle periods, estimated it from the SD of total operative time: SD_idle = M_idle*(SD_time/M_time). |
| Mackenzie et al. | 2021 | Enhanced Training Benefits of Video Recording Surgery With Automated Hand Motion Analysis | World Journal of Surgery | Values given as means and ranges. Compared experts and residents post-training. SD for idle time not given, estimated from variance of total active time. |
| Oropesa et al. | 2013 | Relevance of Motion-Related Assessment Metrics in Laparoscopic Surgery | Surgical Innovation | Means and SDs estimated from boxplots. Reports dominant and non-dominant hand separately, I picked non-dominant hand. Results for Coordinated pulling task. |
| Hung et al. | 2018 | Development and Validation of Objective Performance Metrics for Robot-Assisted Radical Prostatectomy: A Pilot Study | Journal of Urology | Values given as mean and 95% conf interval. SD calculated from conf interval by sqrt(N)*(upper lim - lower lim)/3.92 |
| Topalli et al. | 2018 | Eye-Hand Coordination Patterns of Intermediate and Novice Surgeons in a Simulation-Based Endoscopic Surgery Training Environment | Journal of Eye Movement Research | Reports “Stand still duration”, which measures the time when tools were still. Corresponds roughly to idle time. Compares novices and intermediates |
| Pérez-Escamirosa | 2020 | Design of a Dynamic Force Measurement System for Training and Evaluation of Suture Surgical Skills | Journal of Medical Systems | Idle time values from Table 2, given as percentage. Note, idle time defined as time when “no reaction between instruments and the tissue was measured.” |
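Two of the notes in the table above describe simple conversions: borrowing the relative spread of task time to estimate the SD of idle periods (D’Angelo et al.), and turning an efficiency fraction into an idle fraction (Koskinen et al.). A minimal sketch with hypothetical helper names and made-up inputs:

```r
# SD of idle periods borrowed from task time,
# assuming both metrics have the same coefficient of variation (SD / mean)
sd_idle_from_time <- function(m_idle, m_time, sd_time) m_idle * (sd_time / m_time)

# Idle fraction from an efficiency fraction (both between 0 and 1)
idle_from_efficiency <- function(efficiency) 1 - efficiency

sd_idle_from_time(m_idle = 12, m_time = 300, sd_time = 60)  # made-up values
idle_from_efficiency(0.85)                                  # made-up value
```

The first conversion is only as good as the assumption that idle periods vary in proportion to task time, so effect sizes built on it should be read with extra caution.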
Run meta-analysis
m.toolidle <- metagen(TE=g,
seTE=SDg,
studlab=Author,
data=df.toolidle,
sm="SMD",
fixed=FALSE,
random=TRUE,
method.tau="REML",
hakn=TRUE,
title="Idle time in Surgery")
summary(m.toolidle)
## Review: Idle time in Surgery
##
## SMD 95%-CI %W(random)
## Koskinen et al. 2.5600 [ 1.8069; 3.3131] 14.2
## Uemura et al. 1.6933 [ 0.7816; 2.6050] 13.6
## D'Angelo et al. 2.7724 [ 1.0363; 4.5085] 10.0
## Mackenzie et al. 0.3642 [-1.6449; 2.3734] 8.8
## Oropesa et al. 0.9556 [-0.1828; 2.0941] 12.6
## Hung et al. 0.6990 [ 0.2837; 1.1144] 15.3
## Topalli et al. 0.6159 [-0.4828; 1.7147] 12.8
## Pérez-Escamirosa -1.3224 [-2.4219; -0.2230] 12.8
##
## Number of studies combined: k = 8
##
## SMD 95%-CI t p-value
## Random effects model 1.0387 [-0.0486; 2.1259] 2.26 0.0584
##
## Quantifying heterogeneity:
## tau^2 = 1.3432 [0.4034; 6.7364]; tau = 1.1590 [0.6352; 2.5955]
## I^2 = 83.6% [69.3%; 91.3%]; H = 2.47 [1.80; 3.38]
##
## Test of heterogeneity:
## Q d.f. p-value
## 42.73 7 < 0.0001
##
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model
Plot forest
forest.meta(m.toolidle, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Idle time in Surgery")
#dev.print(pdf, "figures/forest_toolidle.pdf", width=8, height=8)
Not many papers have focused on idle time.
Tool path length measures how far the tools travel during the task.
Load data
df.toolpl <- read_excel('data/surgical_metrics.xlsx', sheet='tool_path_length')
Print studies
| Author | Year | Study | Journal | Note |
|---|---|---|---|---|
| Aggarwal et al. | 2007 | An evaluation of the feasibility, validity, and reliability of laparoscopic skills assessment in the operating room | Annals of Surgery | Whole procedure, paper reports medians and inter-quartile ranges, the SDs are calculated from these (IQR*(3/4)) |
| Moorthy et al. | 2004 | Bimodal assessment of laparoscopic suturing skills: Construct and concurrent validity | Surgical Endoscopy and Other Interventional Techniques | box trainer |
| Smith et al. | 2002 | Motion analysis: A tool for assessing laparoscopic dexterity in the performance of a laboratory-based laparoscopic cholecystectomy | Surgical Endoscopy and Other Interventional Techniques | Multiple tasks, picked Calot’s triangle. Surgeon groups A and C compared |
| Pagador et al. | 2012 | Decomposition and analysis of laparoscopic suturing task using tool-motion analysis (TMA): Improving the objective assessment | International Journal of Computer Assisted Radiology and Surgery | Study reported left and right hand movements separately, I picked left hand |
| Goldbraikh et al. | 2021 | Video-based fully automatic assessment of Open Surgery suturing skills | International Journal of Computer Assisted Radiology and Surgery | Task:Balloon dominant hand |
| Jimbo et al. | 2017 | A new innovative laparoscopic fundoplication training simulator with a surgical skill validation system | Surgical Endoscopy | Estimated effects and SDs from barplots. Reports left/right hand separately, I used left hand results |
| Hofstad et al. | 2013 | A study of psychomotor skills in minimally invasive surgery: What differentiates expert and nonexpert performance | Surgical Endoscopy and Other Interventional Techniques | Estimated effects and SDs from barplots. Reports left/right hand separately, I used left hand results |
| Oropesa et al. | 2013 | Relevance of Motion-Related Assessment Metrics in Laparoscopic Surgery | Surgical Innovation | Means and SDs estimated from boxplots. Reports dominant and non-dominant hand separately, I picked non-dominant hand. Results for Coordinated pulling task. |
| Pellen et al. | 2009 | Laparoscopic surgical skills assessment: Can simulators replace experts? | World Journal of Surgery | Values estimated from boxplots |
| D’Angelo et al. | 2015 | Idle time: An underdeveloped performance metric for assessing surgical skill | American Journal of Surgery | NA |
| Hung et al. | 2018 | Development and Validation of Objective Performance Metrics for Robot-Assisted Radical Prostatectomy: A Pilot Study | Journal of Urology | Values given as mean and 95% conf interval. SD calculated from conf interval by sqrt(N)*(upper lim - lower lim)/3.92. Results for non-dominant hand |
| Vedula et al. | 2016 | Task-Level vs. Segment-Level Quantitative Metrics for Surgical Skill Assessment | Journal of Surgical Education | Effects and SDs estimated from barplots. Paper does not give Ne/Nn directly, total of 135 trials performed by 14 novices and 4 experts, so I estimated sample sizes by 135 * (14/(14+4)) for novices and 135 * (4/(14+4)) for experts |
| Yamaguchi et al. | 2011 | Objective assessment of laparoscopic suturing skills using a motion-tracking system | Surgical Endoscopy | Used results for left hand, for the whole procedure |
| Harada et al. | 2015 | Assessing Microneurosurgical Skill with Medico-Engineering Technology | World Neurosurgery | Results for left hand, estimated from boxplot |
| Ebina et al. | 2021 | Motion analysis for better understanding of psychomotor skills in laparoscopy: objective assessment-based simulation training using animal organs | Surgical Endoscopy | Results for needle holder (left hand), from task 3, knot tying and suturing. Results given in paper as medians and inter-quartile ranges |
| Chmarra et al. | 2010 | Objective classification of residents based on their psychomotor laparoscopic skills | Surgical Endoscopy and Other Interventional Techniques | Values estimated from plots, used the pipe cleaner task results. |
| Rittenhouse et al. | 2014 | Design and validation of an assessment tool for open surgical procedures | Surgical Endoscopy | Used Wii (IR sensor) and Patriot EM tracking. Results are for the Patriot tracking system. Values estimated from barplot (Fig. 6) |
| Balasundaram et al. | 2022 | Acquisition of microvascular suturing techniques is feasible using objective measures of performance outside of the operating room | British Journal of Oral and Maxillofacial Surgery | Results for novices are for post-intervention (training), fig 5. Effects estimated from the figure. |
| Glarner et al. | 2014 | Quantifying technical skills during open operations using video-based motion analysis | Surgery | Effects and SDs estimated from fig 3. Effects are for non-dominant hand (ND). The task was split into four sub-tasks, results here are for suturing, C. Six patients operated on, novice and expert performed the same operation in parallel |
| Zhenzhu et al. | 2020 | Feasibility Study of the Low-Cost Motion Tracking System for Assessing Endoscope Holding Skills | World Neurosurgery | Values estimated from boxplots (Fig. 6), for the 0’ setup. |
| Franco-González et al. | 2021 | Development of a 3D Motion Tracking System for the Analysis of Skills in Microsurgery | Journal of Medical Systems | Values are for the suturing task |
| Berges et al. | 2022 | Eye Tracking and Motion Data Predict Endoscopic Sinus Surgery Skill | Laryngoscope | Participants completed 9 tasks. Results are for total distance in tracker units |
| Saleh et al. | 2006 | Evaluating surgical dexterity during corneal suturing | Archives of Ophthalmology | Values given as medians and inter-quartile ranges. Values are for novice and expert surgeons (Table) |
| Balal et al. | 2019 | Computer analysis of individual cataract surgery segments in the operating room | Eye (Basingstoke) | Results from Table 1 for CCC |
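Several notes in the table describe how a standard deviation or sample size was derived when the paper did not report it. The sketch below collects those rules as small helper functions. The function names and the first two example inputs are made up for illustration. The 3/4 factor is the notebook's own rule and is close to the normal-theory value (IQR / 1.349, about 0.74 * IQR).

```r
# SD from an inter-quartile range, using the notebook's rule SD = IQR * 3/4
sd_from_iqr <- function(iqr) iqr * 3 / 4

# SD from a 95% confidence interval of a mean with n observations
sd_from_ci95 <- function(lower, upper, n) sqrt(n) * (upper - lower) / 3.92

# Split a total trial count in proportion to the number of participants per group
split_trials <- function(total, n_nov, n_exp) {
  c(novice = total * n_nov / (n_nov + n_exp),
    expert = total * n_exp / (n_nov + n_exp))
}

print(sd_from_iqr(8))               # made-up IQR of 8
print(sd_from_ci95(4.0, 6.0, 25))   # made-up CI [4, 6] with n = 25
vedula_n <- split_trials(135, 14, 4)  # counts from the Vedula et al. note
print(vedula_n)
```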
Run meta-analysis
m.toolpl <- metagen(TE=g,
seTE=SDg,
studlab=Author,
data=df.toolpl,
sm="SMD",
fixed=FALSE,
random=TRUE,
method.tau="REML",
hakn=TRUE,
title="Tool path length in Surgery")
summary(m.toolpl)
## Review: Tool path length in Surgery
##
## SMD 95%-CI %W(random)
## Aggarwal et al. 0.0647 [-0.5606; 0.6900] 5.2
## Moorthy et al. 1.3161 [ 0.2542; 2.3780] 4.1
## Smith et al. 2.6541 [ 1.5194; 3.7889] 3.9
## Pagador et al. 6.2865 [ 2.4827; 10.0904] 0.9
## Goldbraikh et al. 2.0174 [ 1.3325; 2.7024] 5.0
## Jimbo et al. 0.8695 [ 0.1950; 1.5441] 5.1
## Hofstad et al. 0.7989 [-0.1876; 1.7853] 4.3
## Oropesa et al. 0.2889 [-0.8108; 1.3885] 4.0
## Pellen et al. 2.0960 [ 0.9877; 3.2042] 4.0
## D'Angelo et al. 1.7506 [ 0.3193; 3.1819] 3.3
## Hung et al. 1.8000 [ 1.3220; 2.2779] 5.5
## Vedula et al. 2.4204 [ 1.9215; 2.9194] 5.5
## Yamaguchi et al. 3.3661 [ 1.8874; 4.8449] 3.2
## Harada et al. 1.0214 [ 0.3743; 1.6685] 5.1
## Ebina et al. 0.9071 [ 0.1861; 1.6281] 5.0
## Chmarra et al. 1.2076 [ 0.2706; 2.1447] 4.4
## Rittenhouse et al. 2.9945 [ 1.4708; 4.5183] 3.1
## Balasundaram et al. 1.2770 [ 0.3080; 2.2460] 4.3
## Glarner et al. -0.7531 [-1.9308; 0.4246] 3.8
## Zhenzhu et al. 4.1986 [ 1.6136; 6.7836] 1.6
## Franco-González et al. 1.6536 [ 0.3795; 2.9276] 3.6
## Berges et al. 0.6704 [ 0.3563; 0.9845] 5.8
## Saleh et al. 1.7083 [ 0.6720; 2.7445] 4.2
## Balal et al. 1.2823 [ 0.5994; 1.9652] 5.1
##
## Number of studies combined: k = 24
##
## SMD 95%-CI t p-value
## Random effects model 1.4605 [1.0056; 1.9153] 6.64 < 0.0001
##
## Quantifying heterogeneity:
## tau^2 = 0.6387 [0.3758; 2.6781]; tau = 0.7992 [0.6130; 1.6365]
## I^2 = 79.0% [69.4%; 85.6%]; H = 2.18 [1.81; 2.64]
##
## Test of heterogeneity:
## Q d.f. p-value
## 109.74 23 < 0.0001
##
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model
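The heterogeneity statistics in the output follow directly from Cochran's Q and the number of studies. As a small check, the sketch below recomputes I^2 and H from the printed Q; the function name `heterogeneity_from_q` is made up for illustration.

```r
# I^2 and H from Cochran's Q and the number of studies k
heterogeneity_from_q <- function(q, k) {
  df <- k - 1
  # Share of total variation attributed to between-study heterogeneity
  i2 <- max(0, (q - df) / q)
  h <- sqrt(q / df)
  list(I2 = i2, H = h)
}

# Q and k as printed in the summary for tool path length
res <- heterogeneity_from_q(109.74, 24)
print(round(c(I2 = res$I2, H = res$H), 3))
```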
Plot forest
forest.meta(m.toolpl, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Tool path length in Surgery")
#dev.print(pdf, "figures/forest_toolpl.pdf", width=8, height=8)
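The forest plot is drawn with `prediction=TRUE`, which adds a prediction interval for the effect in a new study. As a rough sketch of the idea, the snippet below computes one from the printed summary values, using a t quantile with k - 2 degrees of freedom and the between-study variance added to the variance of the pooled estimate. This particular form is an assumption; the meta package's exact choice of standard error and degrees of freedom depends on its version and settings, so the plotted interval may differ.

```r
# Values as printed in the summary for tool path length
k        <- 24
estimate <- 1.4605
ci_lower <- 1.0056
ci_upper <- 1.9153
tau2     <- 0.6387

# Standard error implied by the Hartung-Knapp interval (t quantile, k - 1 df)
se <- (ci_upper - ci_lower) / (2 * qt(0.975, k - 1))

# Prediction interval: between-study variance added to the estimate's variance,
# with a t quantile on k - 2 degrees of freedom (assumed form, see lead-in)
half_width <- qt(0.975, k - 2) * sqrt(se^2 + tau2)
pred_int <- c(lower = estimate - half_width, upper = estimate + half_width)
print(round(pred_int, 2))
```

The interval is much wider than the confidence interval of the pooled effect, because it also reflects how much the true effects vary between studies.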
Tool path length is also a very common metric. Most studies report that novices have a much larger path length, indicating less efficient movements. Results differ based on task and surgical technique.
Tool velocity/speed measures how fast the surgical tool or tools are moving.
Load data
df.toolvelocity <- read_excel('data/surgical_metrics.xlsx', sheet='tool_velocity')
Print studies
| Author | Year | Study | Journal | Note |
|---|---|---|---|---|
| Davids et al. | 2021 | Automated vision-based microsurgical skill analysis in neurosurgery using deep learning: Development and preclinical validation. | World Neurosurgery | Values given as medians |
| Pastewski et al. | 2021 | Analysis of Instrument Motion and the Impact of Residency Level and Concurrent Distraction on Laparoscopic Skills | Journal of Surgical Education | Junior and Senior residents. Did task with and without secondary task (to add distractions). Velocity was reported for three degrees of freedom of motion (yaw, pitch, roll). Results here are for Roll and NO secondary task. |
| Hwang et al. | 2006 | Correlating motor performance with surgical error in laparoscopic cholecystectomy | Surgical Endoscopy and Other Interventional Techniques | NA |
| Ebina et al. | 2021 | Motion analysis for better understanding of psychomotor skills in laparoscopy: objective assessment-based simulation training using animal organs | Surgical Endoscopy | Results for needle holder (left hand), from task 3, knot tying and suturing. Results given in paper as medians and inter-quartile ranges |
| Jimbo et al. | 2017 | A new innovative laparoscopic fundoplication training simulator with a surgical skill validation system | Surgical Endoscopy | Estimated effects and SDs from barplots. Reports left/right hand separately, I used left hand results |
| Judkins et al. | 2009 | Objective evaluation of expert and novice performance during robotic surgical training tasks | Surgical Endoscopy | Estimated effects and SDs from barplots. Compared experts and novices post-training. Results are for bimanual carrying task, which was repeated 3 times by each participant (5 novices 5 experts) |
| Hofstad et al. | 2013 | A study of psychomotor skills in minimally invasive surgery: What differentiates expert and nonexpert performance | Surgical Endoscopy and Other Interventional Techniques | Estimated effects and SDs from barplots. Reports left/right hand separately, I used left hand results |
| Frasier et al. | 2016 | A marker-less technique for measuring kinematics in the operating room | Surgery (United States) | Gives values for grand average and by different tasks. I used grand average results. |
| Azari et al. | 2018 | Can surgical performance for varying experience be measured from hand motions? | Proceedings of the Human Factors and Ergonomics Society | Did not report SDs for motion metrics. I estimated SD from the subjective grading fluidity of motion SD, so acceleration SD = acceleration Mean * (grade SD/grade Mean). |
| Pagador et al. | 2012 | Decomposition and analysis of laparoscopic suturing task using tool-motion analysis (TMA): Improving the objective assessment | International Journal of Computer Assisted Radiology and Surgery | Study reported left and right hand movements separately, I picked left hand. First subtask |
| Hung et al. | 2018 | Development and Validation of Objective Performance Metrics for Robot-Assisted Radical Prostatectomy: A Pilot Study | Journal of Urology | Values given as mean and 95% conf interval. SD calculated from conf interval by sqrt(N)*(upper lim - lower lim)/3.92. Results for non-dominant hand |
| Mazomenos et al. | 2016 | Catheter manipulation analysis for objective performance and technical skills assessment in transcatheter aortic valve implantation | International Journal of Computer Assisted Radiology and Surgery | Task was performed with conventional tools and with robotic tools. Results are for conventional tools. There were 2 stages, results here are for stage 1. SDs evaluated from boxplots (Fig. 5). Expert jerk weirdly small? |
| Glarner et al. | 2014 | Quantifying technical skills during open operations using video-based motion analysis | Surgery | Effects and SDs estimated from fig 3. Effects are for non-dominant hand (ND). The task was split into four sub-tasks, results here are for suturing, C. Six patients operated on, novice and expert performed the same operation in parallel |
| Zhenzhu et al. | 2020 | Feasibility Study of the Low-Cost Motion Tracking System for Assessing Endoscope Holding Skills | World Neurosurgery | Values estimated from boxplots (Fig. 6), for the 0’ setup. |
| Franco-González et al. | 2021 | Development of a 3D Motion Tracking System for the Analysis of Skills in Microsurgery | Journal of Medical Systems | Values are for the suturing task |
| Berges et al. | 2022 | Eye Tracking and Motion Data Predict Endoscopic Sinus Surgery Skill | Laryngoscope | Participants completed 9 tasks. Results are for average velocity in tracker units |
Run meta-analysis
m.toolvelocity <- metagen(TE=g,
seTE=SDg,
studlab=Author,
data=df.toolvelocity,
sm="SMD",
fixed=FALSE,
random=TRUE,
method.tau="REML",
hakn=TRUE,
title="Tool velocity in Surgery")
summary(m.toolvelocity)
## Review: Tool velocity in Surgery
##
## SMD 95%-CI %W(random)
## Davids et al. 0.5140 [-1.5370; 2.5650] 3.9
## Pastewski et al. -0.7177 [-1.4028; -0.0326] 7.5
## Hwang et al. 6.1176 [ 1.5045; 10.7307] 1.2
## Ebina et al. -0.8684 [-1.5865; -0.1503] 7.4
## Jimbo et al. -0.7654 [-1.4334; -0.0974] 7.6
## Judkins et al. 0.6675 [-0.0690; 1.4039] 7.4
## Hofstad et al. 1.0086 [-0.0002; 2.0174] 6.6
## Frasier et al. -1.1447 [-1.7143; -0.5751] 7.8
## Azari et al. -0.2982 [-1.2042; 0.6078] 6.9
## Pagador et al. 0.0585 [-1.3278; 1.4448] 5.5
## Hung et al. -1.6706 [-2.1389; -1.2022] 8.0
## Mazomenos et al. -1.7056 [-3.0573; -0.3540] 5.6
## Glarner et al. -0.6245 [-1.7880; 0.5391] 6.2
## Zhenzhu et al. 2.9204 [ 0.8651; 4.9757] 3.9
## Franco-González et al. 1.0406 [-0.1276; 2.2088] 6.1
## Berges et al. -0.7976 [-1.1148; -0.4803] 8.3
##
## Number of studies combined: k = 16
##
## SMD 95%-CI t p-value
## Random effects model -0.2317 [-0.9313; 0.4680] -0.71 0.4912
##
## Quantifying heterogeneity:
## tau^2 = 0.9091 [0.5199; 6.3097]; tau = 0.9535 [0.7210; 2.5119]
## I^2 = 80.8% [69.7%; 87.8%]; H = 2.28 [1.82; 2.86]
##
## Test of heterogeneity:
## Q d.f. p-value
## 77.95 15 < 0.0001
##
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model
Plot forest
forest.meta(m.toolvelocity, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Tool velocity in Surgery")
#dev.print(pdf, "figures/forest_toolvelocity.pdf", width=8, height=8)
Velocity and related metrics such as acceleration are moderately popular. Results vary a lot: in some studies novices are faster, in others experts are. This may depend on the task.
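To put a number on how mixed the directions are, the sketch below tallies the velocity studies by whether their 95% confidence interval lies entirely below zero, entirely above zero, or includes zero. The limits are copied by hand from the summary output; the data frame and column names are made up for illustration.

```r
# Confidence limits copied from the velocity summary output (same study order)
ci <- data.frame(
  study = c("Davids", "Pastewski", "Hwang", "Ebina", "Jimbo", "Judkins",
            "Hofstad", "Frasier", "Azari", "Pagador", "Hung", "Mazomenos",
            "Glarner", "Zhenzhu", "Franco-González", "Berges"),
  lower = c(-1.5370, -1.4028, 1.5045, -1.5865, -1.4334, -0.0690,
            -0.0002, -1.7143, -1.2042, -1.3278, -2.1389, -3.0573,
            -1.7880, 0.8651, -0.1276, -1.1148),
  upper = c(2.5650, -0.0326, 10.7307, -0.1503, -0.0974, 1.4039,
            2.0174, -0.5751, 0.6078, 1.4448, -1.2022, -0.3540,
            0.5391, 4.9757, 2.2088, -0.4803)
)

# Classify each study by where its interval sits relative to zero
ci$direction <- ifelse(ci$upper < 0, "negative",
                ifelse(ci$lower > 0, "positive", "includes zero"))
direction_counts <- table(ci$direction)
print(direction_counts)
```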
Tool acceleration measures how much the tool/tools accelerate during the task.
Load data
df.toolacc <- read_excel('data/surgical_metrics.xlsx', sheet='tool_acceleration')
Print studies
| Author | Year | Study | Journal | Note |
|---|---|---|---|---|
| Azari et al. | 2018 | Can surgical performance for varying experience be measured from hand motions? | Proceedings of the Human Factors and Ergonomics Society | Did not report SDs for motion metrics. I estimated SD from the subjective grading fluidity of motion SD, so acceleration SD = acceleration Mean * (grade SD/grade Mean). |
| Frasier et al. | 2016 | A marker-less technique for measuring kinematics in the operating room | Surgery (United States) | Gives values for grand average and by different tasks. I used grand average results. |
| Ebina et al. | 2021 | Motion analysis for better understanding of psychomotor skills in laparoscopy: objective assessment-based simulation training using animal organs | Surgical Endoscopy | Results for needle holder (left hand), from task 3, knot tying and suturing. Results given in paper as medians and inter-quartile ranges |
| Pastewski et al. | 2021 | Analysis of Instrument Motion and the Impact of Residency Level and Concurrent Distraction on Laparoscopic Skills | Journal of Surgical Education | Junior and Senior residents. Did task with and without secondary task (to add distractions). Acceleration was reported for three degrees of freedom of motion (yaw, pitch, roll). Results here are for Roll and NO secondary task. |
| Davids et al. | 2021 | Automated vision-based microsurgical skill analysis in neurosurgery using deep learning: Development and preclinical validation. | World Neurosurgery | Values given as medians. Sd estimated from boxplot |
| Glarner et al. | 2014 | Quantifying technical skills during open operations using video-based motion analysis | Surgery | Effects and SDs estimated from fig 3. Effects are for non-dominant hand (ND). The task was split into four sub-tasks, results here are for suturing, C. Six patients operated on, novice and expert performed the same operation in parallel |
| Zhenzhu et al. | 2020 | Feasibility Study of the Low-Cost Motion Tracking System for Assessing Endoscope Holding Skills | World Neurosurgery | Values estimated from boxplots (Fig. 6), for the 0’ setup. Max acceleration |
| Franco-González et al. | 2021 | Development of a 3D Motion Tracking System for the Analysis of Skills in Microsurgery | Journal of Medical Systems | Values are for the suturing task |
Run meta-analysis
m.toolacc <- metagen(TE=g,
seTE=SDg,
studlab=Author,
data=df.toolacc,
sm="SMD",
fixed=FALSE,
random=TRUE,
method.tau="REML",
hakn=TRUE,
title="Tool acceleration in Surgery")
summary(m.toolacc)
## Review: Tool acceleration in Surgery
##
## SMD 95%-CI %W(random)
## Azari et al. -0.3713 [-1.2803; 0.5377] 13.9
## Frasier et al. -1.0298 [-1.5922; -0.4674] 16.7
## Ebina et al. -0.7891 [-1.5016; -0.0767] 15.5
## Pastewski et al. 0.1911 [-0.4748; 0.8570] 15.9
## Davids et al. -0.0233 [-2.0633; 2.0167] 6.6
## Glarner et al. -0.7538 [-1.9316; 0.4240] 11.7
## Zhenzhu et al. 2.1002 [ 0.3361; 3.8643] 7.9
## Franco-González et al. 1.0900 [-0.0852; 2.2652] 11.7
##
## Number of studies combined: k = 8
##
## SMD 95%-CI t p-value
## Random effects model -0.1119 [-0.9315; 0.7077] -0.32 0.7563
##
## Quantifying heterogeneity:
## tau^2 = 0.5701 [0.1020; 4.2179]; tau = 0.7550 [0.3194; 2.0538]
## I^2 = 70.0% [37.6%; 85.6%]; H = 1.83 [1.27; 2.63]
##
## Test of heterogeneity:
## Q d.f. p-value
## 23.32 7 0.0015
##
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model
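The random-effects weights and the pooled estimate can be reconstructed from the printed rows. The sketch below recovers each study's standard error from its 95% interval, adds the reported tau^2, and forms inverse-variance weights. It starts from rounded printed values, so small differences from the output are expected; the variable names are made up for illustration.

```r
# Effect sizes and 95% confidence limits copied from the acceleration summary
te    <- c(-0.3713, -1.0298, -0.7891, 0.1911, -0.0233, -0.7538, 2.1002, 1.0900)
lower <- c(-1.2803, -1.5922, -1.5016, -0.4748, -2.0633, -1.9316, 0.3361, -0.0852)
upper <- c(0.5377, -0.4674, -0.0767, 0.8570, 2.0167, 0.4240, 3.8643, 2.2652)
tau2  <- 0.5701  # REML estimate reported in the output

# Recover each study's standard error from its normal-theory 95% interval
se <- (upper - lower) / (2 * qnorm(0.975))

# Random-effects weights: inverse of within-study plus between-study variance
w <- 1 / (se^2 + tau2)
weights_pct <- 100 * w / sum(w)
pooled <- sum(w * te) / sum(w)

print(round(weights_pct, 1))
print(round(pooled, 4))
```

Because tau^2 is large relative to most within-study variances, the weights are fairly even and precise studies gain little over imprecise ones.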
Plot forest
forest.meta(m.toolacc, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Tool acceleration in Surgery")
#dev.print(pdf, "figures/forest_toolacceleration.pdf", width=8, height=8)
Few papers focused on tool acceleration. Jerk (the third derivative of position, i.e. the derivative of acceleration) is much more popular.
Jerk is the third derivative of the surgical instrument's position, and measures how smooth the movements are.
Load data
df.jerk <- read_excel('data/surgical_metrics.xlsx', sheet='tool_jerk')
Print studies
| Author | Year | Study | Journal | Note |
|---|---|---|---|---|
| Ghasemloonia et al. | 2017 | Surgical Skill Assessment Using Motion Quality and Smoothness | Journal of Surgical Education | Results from task C included. Task had 4 groups of participants, results are from surgeons and residents. 9 trials per participant, 4 participants per group, so n=36 for both groups |
| Hwang et al. | 2006 | Correlating motor performance with surgical error in laparoscopic cholecystectomy | Surgical Endoscopy and Other Interventional Techniques | NA |
| Ebina et al. | 2021 | Motion analysis for better understanding of psychomotor skills in laparoscopy: objective assessment-based simulation training using animal organs | Surgical Endoscopy | Results from task 3, knot tying and suturing. Results given in paper as medians and inter-quartile ranges |
| Azari et al. | 2018 | Can surgical performance for varying experience be measured from hand motions? | Proceedings of the Human Factors and Ergonomics Society | Reported grand average results by skill group and by skill group and task. Results included here are the grand average by skill. Had 4 skill groups, picked medical students and attending surgeons. Paper did not report SDs for motion metrics, so I used the ratio of the subjective evaluation's SD to its mean to estimate the SD. I.e. for novices the subjective motion fluidity score was mean=4.1, sd=1.9, so the SD for jerk was calculated as 178.34 * (1.9/4.1), i.e. mean jerk * (SD of fluidity score / mean of fluidity score) |
| Davids et al. | 2021 | Automated vision-based microsurgical skill analysis in neurosurgery using deep learning: Development and preclinical validation. | World Neurosurgery | Values given as medians |
| Oropesa et al. | 2013 | Relevance of Motion-Related Assessment Metrics in Laparoscopic Surgery | Surgical Innovation | Means and SDs estimated from boxplots. Reports dominant and non-dominant hand separately, I picked non-dominant hand. Results for Coordinated pulling task. |
| Maithel et al | 2005 | Simulated laparoscopy using a head-mounted display vs traditional video monitor: An assessment of performance and muscle fatigue | Surgical Endoscopy and Other Interventional Techniques | NA |
| Liang et al. | 2018 | Motion control skill assessment based on kinematic analysis of robotic end-effector movements | The International Journal of Medical Robotics and Computer Assisted Surgery | Estimated from boxplots. Reported left/right hand separately, here the results are for left hand |
| Islam et al. | 2016 | Affordable, web-based surgical skill training and evaluation tool | Journal of Biomedical Informatics | Mean values estimated from boxplot. Standard deviations were not given, so I used values similar to those in our study (i = 0): the novices' SD is about 1/5 of the mean, the experts' is 1/12. Jerk was measured with a “jerkiness score” |
| Hofstad et al. | 2017 | Psychomotor skills assessment by motion analysis in minimally invasive surgery on an animal organ | Minimally Invasive Therapy and Allied Technologies | Values estimated from boxplot, used results for US hook |
| Shafiel et al. | 2017 | Motor Skill Evaluation During Robot-Assisted Surgery | Volume 5A: 41st Mechanisms and Robotics Conference | NA |
| Chmarra et al. | 2010 | Objective classification of residents based on their psychomotor laparoscopic skills | Surgical Endoscopy and Other Interventional Techniques | Values estimated from plots, used the pipe cleaner task results. |
| Mazomenos et al. | 2016 | Catheter manipulation analysis for objective performance and technical skills assessment in transcatheter aortic valve implantation | International Journal of Computer Assisted Radiology and Surgery | Task was performed with conventional tools and with robotic tools. Results are for conventional tools. There were 2 stages, results here are for stage 1. SDs evaluated from boxplots (Fig. 5). Expert jerk weirdly small? |
| Berges et al. | 2022 | Eye Tracking and Motion Data Predict Endoscopic Sinus Surgery Skill | Laryngoscope | Participants completed 9 tasks. Results are for smoothness. Text states that units for smoothness are 1/s^4, but Table 1 says that smoothness is the derivative of acceleration (jerk). Smoothness values are the first ones from Table 2. |
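The Azari et al. note borrows the relative spread of another score to estimate a missing SD. As a worked example, the sketch below reproduces that calculation with the numbers given in the note, along with the trial count used as sample size in the Ghasemloonia et al. note. The function name `sd_from_cv` is made up for illustration.

```r
# SD borrowed via the coefficient of variation (SD / mean) of another score
sd_from_cv <- function(metric_mean, ref_mean, ref_sd) {
  metric_mean * (ref_sd / ref_mean)
}

# Numbers from the Azari et al. note: novice mean jerk 178.34,
# fluidity score mean 4.1 and SD 1.9
azari_jerk_sd <- sd_from_cv(178.34, ref_mean = 4.1, ref_sd = 1.9)
print(round(azari_jerk_sd, 2))

# Trials used as sample size, as in the Ghasemloonia et al. note:
# 9 trials per participant, 4 participants per group
ghasemloonia_n <- 9 * 4
print(ghasemloonia_n)
```

Counting repeated trials as independent observations tends to make the standard error look smaller than it would be with participants as the unit, which is worth keeping in mind when reading that study's narrow interval.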
Run meta-analysis
m.jerk <- metagen(TE=g,
seTE=SDg,
studlab=Author,
data=df.jerk,
sm="SMD",
fixed=FALSE,
random=TRUE,
method.tau="REML",
hakn=TRUE,
title="Jerk in Surgery")
summary(m.jerk)
## Review: Jerk in Surgery
##
## SMD 95%-CI %W(random)
## Ghasemloonia et al. 1.7090 [ 1.1677; 2.2504] 8.3
## Hwang et al. 2.6183 [ 0.1709; 5.0658] 3.9
## Ebina et al. -0.9365 [-1.6598; -0.2133] 8.0
## Azari et al. -0.1972 [-0.8043; 0.4098] 8.2
## Davids et al. 0.1307 [-1.9101; 2.1714] 4.7
## Oropesa et al. -0.9775 [-2.1179; 0.1629] 6.9
## Maithel et al 1.6060 [ 0.7461; 2.4658] 7.6
## Liang et al. 0.1596 [-0.7184; 1.0377] 7.6
## Islam et al. 3.6094 [ 2.4907; 4.7280] 7.0
## Hofstad et al. 1.3201 [-0.1549; 2.7952] 6.0
## Shafiel et al. 0.4174 [ 0.2951; 0.5397] 8.8
## Chmarra et al. 0.8995 [-0.0026; 1.8015] 7.5
## Mazomenos et al. 1.0303 [-0.1862; 2.2468] 6.7
## Berges et al. 0.4687 [ 0.1586; 0.7788] 8.7
##
## Number of studies combined: k = 14
##
## SMD 95%-CI t p-value
## Random effects model 0.7731 [0.0558; 1.4904] 2.33 0.0367
##
## Quantifying heterogeneity:
## tau^2 = 1.2252 [0.5180; 3.8165]; tau = 1.1069 [0.7197; 1.9536]
## I^2 = 85.5% [77.2%; 90.8%]; H = 2.63 [2.10; 3.29]
##
## Test of heterogeneity:
## Q d.f. p-value
## 89.66 13 < 0.0001
##
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model
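With the Hartung-Knapp adjustment, the pooled confidence interval and test use a t distribution with k - 1 degrees of freedom rather than the normal distribution. The sketch below backs out the standard error from the printed interval for jerk and recomputes the t statistic and p-value. It works from rounded printed values, so the last digits may differ slightly.

```r
# Values as printed in the summary for jerk
k        <- 14
estimate <- 0.7731
lower    <- 0.0558
upper    <- 1.4904

df <- k - 1
# Hartung-Knapp intervals use a t quantile with k - 1 degrees of freedom
se_hk  <- (upper - lower) / (2 * qt(0.975, df))
t_stat <- estimate / se_hk
p_val  <- 2 * pt(-abs(t_stat), df)

print(round(c(se = se_hk, t = t_stat, p = p_val), 4))
```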
Plot forest
forest.meta(m.jerk, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Jerk in Surgery")
#dev.print(pdf, "figures/forest_tooljerk.pdf", width=8, height=8)
TBD
Tool force is the force the surgeon applies with the surgical tools, e.g. when grasping something.
Load data
df.force <- read_excel('data/surgical_metrics.xlsx', sheet='tool_force')
Print studies
| Author | Year | Study | Journal | Note |
|---|---|---|---|---|
| Harada et al. | 2015 | Assessing Microneurosurgical Skill with Medico-Engineering Technology | World Neurosurgery | Results for needle extraction phase (c), estimated from boxplot. Maximum needle gripping force |
| Prasad et al. | 2016 | Objective Assessment of Laparoscopic Force and Psychomotor Skills in a Novel Virtual Reality-Based Haptic Simulator | Journal of Surgical Education | Results estimated from boxplot. Whole group data (subplot a) reported here. |
| Horeman et al. | 2014 | Assessment of Laparoscopic Skills Based on Force and Motion Parameters | IEEE Transactions on Biomedical Engineering | Results estimated from boxplot, for task 2. Max force values used. |
| Trejos et al. | 2014 | Development of force-based metrics for skills assessment in minimally invasive surgery | Surgical Endoscopy | Used results for max grasp force, values evaluated from Fig. 4 (a). Compared experience level 1 and 6 |
| Woodrow et al. | 2007 | Training and evaluating spinal surgeons: The development of novel performance measures | Spine | Values estimated from Fig. 2. Values are mean forces. Compared results for lumbar level L2. |
| Sugiyama et al. | 2018 | Forces of Tool-Tissue Interaction to Assess Surgical Skill Level | JAMA Surgery | Evaluated values from Fig. 3 c. Standardizer, maximum force. |
| Araki et al. | 2017 | Comparison of the performance of experienced and novice surgeons: measurement of gripping force during laparoscopic surgery performed on pigs using forceps with pressure sensors | Surgical Endoscopy | The plot shows that novices grasped with force that is slightly over 8, but the text reports 7.15. Typo in text? SDs evaluated from boxplots. 4 novices and 4 experts, task completed twice. |
| Prasad et al. | 2018 | Face and Construct Validity of a Novel Virtual Reality–Based Bimanual Laparoscopic Force-Skills Trainer With Haptics Feedback | Surgical Innovation | Results are for the suturing task, non-dominant hand, mean needle force. Same dataset as in Prasad (2016)? |
| Amiel et al. | 2020 | Experienced surgeons versus novice surgery residents: Validating a novel knot tying simulator for vessel ligation | Surgery | 4 different knot types, each completed twice. 15 experts and 30 novices. Results are for the deep two hand knot (Fig. 2). Effects estimated from the plot, for Total Force. |
| Yoshida et al. | 2013 | Analysis of laparoscopic dissection skill by instrument tip force measurement | Surgical Endoscopy | Peak horizontal force results used. 10 novices and 10 experts, each performed the task 10 times |
| de Mathelin et al. | 2019 | Sensors for expert grip force profiling: Towards benchmarking manual control of a robotic device for surgical tool movements | Sensors | Results are for non-dominant hand, sensor 6 (S6), which was placed on the ring finger. Expert user was right-handed and novice left-handed. One expert, one novice participant. Expert results are for 12 sessions, novice results are for 10 sessions. |
| Pérez-Escamirosa | 2020 | Design of a Dynamic Force Measurement System for Training and Evaluation of Suture Surgical Skills | Journal of Medical Systems | Force is measured from the pad where the sutures are made. Mean force values from Table 2 |
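Two rows in this table share the label "Prasad et al." (2016 and 2018), so they cannot be told apart in output that is labelled by author only. One possible way to get distinct labels, sketched below on a hand-copied subset of the table, is to append the year. The data frame `studies` and the column `label` are made up for illustration.

```r
# Author/year pairs copied from the force table (subset)
studies <- data.frame(
  Author = c("Prasad et al.", "Horeman et al.", "Prasad et al."),
  Year   = c(2016, 2014, 2018)
)

# Append the year so that every study label is distinct
studies$label <- paste(studies$Author, studies$Year)
print(studies$label)
```

In the notebook's call this would presumably amount to passing `studlab=paste(Author, Year)` to `metagen`, assuming the sheet has a `Year` column as the printed table suggests.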
Run meta-analysis
m.force <- metagen(TE=g,
seTE=SDg,
studlab=Author,
data=df.force,
sm="SMD",
fixed=FALSE,
random=TRUE,
method.tau="REML",
hakn=TRUE,
title="Force use in Surgery")
summary(m.force)
## Review: Force use in Surgery
##
## SMD 95%-CI %W(random)
## Harada et al. 0.5357 [-0.0830; 1.1545] 8.7
## Prasad et al. 1.2450 [ 0.6378; 1.8523] 8.8
## Horeman et al. 2.7082 [ 1.5555; 3.8609] 8.3
## Trejos et al. 1.5351 [ 0.2224; 2.8477] 8.2
## Woodrow et al. 3.8205 [ 2.2438; 5.3972] 7.9
## Sugiyama et al. 0.3071 [-0.8880; 1.5022] 8.3
## Araki et al. 1.4757 [ 0.3564; 2.5950] 8.4
## Prasad et al. -3.3271 [-4.1910; -2.4633] 8.6
## Amiel et al. 1.1005 [ 0.6332; 1.5678] 8.8
## Yoshida et al. 0.3002 [ 0.0214; 0.5789] 8.9
## de Mathelin et al. 7.0083 [ 4.6980; 9.3186] 6.9
## Pérez-Escamirosa 1.3790 [ 0.2709; 2.4872] 8.4
##
## Number of studies combined: k = 12
##
## SMD 95%-CI t p-value
## Random effects model 1.3922 [-0.0876; 2.8720] 2.07 0.0627
##
## Quantifying heterogeneity:
## tau^2 = 4.7225 [2.2326; 16.1295]; tau = 2.1731 [1.4942; 4.0162]
## I^2 = 93.0% [89.6%; 95.3%]; H = 3.78 [3.10; 4.61]
##
## Test of heterogeneity:
## Q d.f. p-value
## 157.44 11 < 0.0001
##
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model
Plot forest
forest.meta(m.force, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Force use in Surgery")
#dev.print(pdf, "figures/forest_toolforce.pdf", width=8, height=8)
Forces are analyzed fairly commonly, but often within tasks, tools, or skill groups rather than between novices and experts.
Pupil size reflects cognitive workload, stress, and a million other things.
Load data
df.pupil <- read_excel('data/surgical_metrics.xlsx', sheet='pupil_dilation')
Print studies
| Author | Year | Study | Journal | Note |
|---|---|---|---|---|
| Castner et al. | 2020 | Pupil diameter differentiates expertise in dental radiography visual search | PLOS ONE | Reported values are medians? Median change from baseline |
| Cabrera-Mino et al. | 2019 | Task-Evoked Pupillary Responses in Nursing Simulation as an Indicator of Stress and Cognitive Load | Clinical Simulation in Nursing | There were different tasks, picked the one that had the most significant result. Values estimated from barplot |
| Bednarik et al. | 2018 | Pupil Size As an Indicator of Visual-motor Workload and Expertise in Microsurgical Training Tasks | Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications | Took the segment ‘needle push’, estimated from plots |
| Gunawardena et al. | 2019 | Assessing Surgeons’ Skill Level in Laparoscopic Cholecystectomy using Eye Metrics | Eye Tracking Research and Applications Symposium (ETRA) | Study had only 4 participants of 3 skill levels who completed >=7 tasks each. I picked the least experienced participant and expert E-2. |
| Dilley et al. | 2020 | Visual behaviour in Robotic Surgery—Demonstrating the validity of the simulated environment | International Journal of Medical Robotics and Computer Assisted Surgery | SDs calculated from inter-quartile ranges (SD = (3/4)*IQR). The paper reports medians. |
| Gao et al. | 2018 | Quantitative evaluations of the effects of noise on mental workloads based on pupil dilation during laparoscopic surgery | American Surgeon | They evaluated different noise conditions, I picked values from the no-noise condition. Paper does not give explicitly the number of participants in groups, only total number (24) which was “divided into experienced and moderately experienced”. I assumed 12 per group |
| Erridge et al. | 2018 | Comparison of gaze behaviour of trainee and experienced surgeons during laparoscopic gastric bypass | British Journal of Surgery | Results for Segment 1, maximum pupil size |
Run meta-analysis
m.pupil <- metagen(TE=g,
seTE=SDg,
studlab=Author,
data=df.pupil,
sm="SMD",
fixed=FALSE,
random=TRUE,
method.tau="REML",
hakn=TRUE,
title="Pupil dilation in Surgery")
summary(m.pupil)
## Review: Pupil dilation in Surgery
##
## SMD 95%-CI %W(random)
## Castner et al. 0.7877 [ 0.6671; 0.9083] 15.8
## Cabrera-Mino et al. 0.8255 [ 0.0502; 1.6009] 14.8
## Bednarik et al. -2.9791 [-3.5250; -2.4332] 15.3
## Gunawardena et al. 1.5927 [ 0.3701; 2.8152] 13.5
## Dilley et al. -0.0152 [-0.7136; 0.6833] 15.0
## Gao et al. 1.2184 [ 0.3422; 2.0946] 14.6
## Erridge et al. 0.2061 [-1.7697; 2.1820] 11.0
##
## Number of studies combined: k = 7
##
## SMD 95%-CI t p-value
## Random effects model 0.2042 [-1.2366; 1.6449] 0.35 0.7406
##
## Quantifying heterogeneity:
## tau^2 = 2.3095 [0.8388; 11.0728]; tau = 1.5197 [0.9158; 3.3276]
## I^2 = 96.7% [95.0%; 97.8%]; H = 5.51 [4.46; 6.81]
##
## Test of heterogeneity:
## Q d.f. p-value
## 182.24 6 < 0.0001
##
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model
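The heterogeneity statistics in the output above follow directly from Cochran's Q and its degrees of freedom. A minimal check, using the standard definitions H = sqrt(Q / df) and I^2 = (Q - df) / Q and the values copied from the printed output:

```r
Q  <- 182.24  # from the "Test of heterogeneity" output above
df <- 6       # k - 1 = 7 - 1

H  <- sqrt(Q / df)            # ratio of observed to expected variation
I2 <- max(0, (Q - df) / Q)    # share of variability attributed to heterogeneity

print(round(c(H = H, I2_percent = 100 * I2), 2))
```

Up to rounding, this should agree with the printed H = 5.51 and I^2 = 96.7%.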
Plot forest
forest.meta(m.pupil, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Pupil dilation in Surgery")
#dev.print(pdf, "figures/forest_pupil.pdf", width=8, height=8)
Prior research indicates that higher stress or cognitive workload is associated with larger pupil size. This is seen in most of the studies. In Bednarik et al. (2018) the effect is reversed. For that study, I picked the needle-piercing segment because it was guaranteed to have uninterrupted visual contact from the participant. It may be that the experts focused more on this segment and therefore had a larger cognitive workload and larger pupil dilations.
Not many studies have measured pupil dilation and compared surgical novices and experts directly. Some used measures such as ICA or entropy (not included here). Pupil dilation is used more often in other fields.
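One way to see how much the reversed Bednarik et al. result drives the pooled estimate is to pool the studies with and without it. The sketch below is not the notebook's analysis: it uses a hand-written DerSimonian-Laird estimator in base R instead of REML with the Hartung-Knapp adjustment, and it recovers the standard errors from the printed 95% confidence intervals, so the numbers will differ somewhat from the `metagen` output. The helper `pool_dl` is a name introduced here for illustration.

```r
# Effect sizes and 95% CIs copied from the summary(m.pupil) output above
study <- c("Castner", "Cabrera-Mino", "Bednarik", "Gunawardena",
           "Dilley", "Gao", "Erridge")
g     <- c(0.7877, 0.8255, -2.9791, 1.5927, -0.0152, 1.2184, 0.2061)
lower <- c(0.6671, 0.0502, -3.5250, 0.3701, -0.7136, 0.3422, -1.7697)
upper <- c(0.9083, 1.6009, -2.4332, 2.8152, 0.6833, 2.0946, 2.1820)

# Standard errors implied by the normal-theory 95% CIs
se <- (upper - lower) / (2 * qnorm(0.975))

# DerSimonian-Laird random-effects pooling (hypothetical helper)
pool_dl <- function(g, se) {
  w    <- 1 / se^2
  fe   <- sum(w * g) / sum(w)                  # common-effect estimate
  Q    <- sum(w * (g - fe)^2)                  # Cochran's Q
  C    <- sum(w) - sum(w^2) / sum(w)
  tau2 <- max(0, (Q - (length(g) - 1)) / C)    # between-study variance
  w_re <- 1 / (se^2 + tau2)
  c(estimate = sum(w_re * g) / sum(w_re), tau2 = tau2, Q = Q)
}

keep        <- study != "Bednarik"
all_studies <- pool_dl(g, se)
no_bednarik <- pool_dl(g[keep], se[keep])
print(round(rbind(all_studies, no_bednarik), 3))
```

With the `meta` package available, a leave-one-out analysis on the fitted object would be the more direct route; the base R version is used here only to keep the sketch self-contained.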
OSATS (Objective Structured Assessment of Technical Skills) is an evaluation instrument that consists of a grading scale and a checklist.
Load data
df.osats <- read_excel('data/surgical_metrics.xlsx', sheet='scale_OSATS')
Print studies
| Author | Year | Study | Journal | Note |
|---|---|---|---|---|
| Nickel et al. | 2016 | Direct Observation versus Endoscopic Video Recording-Based Rating with the Objective Structured Assessment of Technical Skills for Training of Laparoscopic Cholecystectomy | European Surgical Research | OSATS score from Table 1, direct observation, novices and experts compared |
| Paley et al. | 2021 | Crowdsourced Assessment of Surgical Skill Proficiency in Cataract Surgery | Journal of Surgical Education | Used modified OSATS. SD estimated from Figure 1F. Used expert ratings. |
| Kassab et al. | 2011 | “Blowing up the barriers” in surgical training: Exploring and validating the concept of distributed simulation | Annals of Surgery | The study had two tasks; results are for DS (distributed simulation) because these were given in the text (box trainer results were given only as a figure). Note that DS was a novel task developed for this study. |
| Black et al. | 2010 | Assessment of surgical competence at carotid endarterectomy under local anaesthesia in a simulated operating theatre | British Journal of Surgery | Results for crisis scenario |
| Willems et al. | 2009 | Assessing Endovascular Skills using the Simulator for Testing and Rating Endovascular Skills (STRESS) Machine | European Journal of Vascular and Endovascular Surgery | Combination of OSATS and some other score? May not be suitable for comparison here. Remove in the future. SDs estimated from Figure 2. |
| Leong et al. | 2008 | Validation of orthopaedic bench models for trauma surgery | Journal of Bone and Joint Surgery - Series B | Used results for DCP (dynamic compression plate). Values estimated from boxplot. |
| Hance et al. | 2005 | Objective assessment of technical skills in cardiac surgery | European Journal of Cardio-thoracic Surgery | Paper reported several tasks, live and blinded scoring. Values here are for LAD anastomosis, blinded scoring. |
| Zevin et al. | 2013 | Development, feasibility, validity, and reliability of a scale for objective assessment of operative performance in laparoscopic gastric bypass surgery | Journal of the American College of Surgeons | Results are for Jejunojejunostomy |
| Hopmans et al. | 2014 | Assessment of surgery residents’ operative skills in the operating theater using a modified Objective Structured Assessment of Technical Skills (OSATS): A prospective multicenter study | Surgery (United States) | Study included various tasks and techniques, results are for laparoscopic cholecystectomy. Novices are PGY1-2 and experts PGY5-6 |
Run meta-analysis
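# Random-effects model on precomputed effect sizes (g) and the values passed as their standard errors (SDg)
m.osats <- metagen(TE=g,
                   seTE=SDg,
                   studlab=Author,
                   data=df.osats,
                   sm="SMD",
                   fixed=FALSE,
                   random=TRUE,
                   method.tau="REML",
                   hakn=TRUE,
                   title="OSATS in Surgery")
summary(m.osats)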
## Review: OSATS in Surgery
##
## SMD 95%-CI %W(random)
## Nickel et al. -1.7336 [-2.8266; -0.6406] 12.1
## Paley et al. -3.8215 [-5.6042; -2.0388] 8.5
## Kassab et al. -2.1749 [-3.2989; -1.0508] 11.9
## Black et al. -5.2357 [-7.1431; -3.3283] 7.9
## Willems et al. -3.7687 [-5.6853; -1.8520] 7.9
## Leong et al. -2.7078 [-4.2138; -1.2017] 9.8
## Hance et al. -1.0602 [-1.9016; -0.2188] 13.5
## Zevin et al. -1.8927 [-2.5553; -1.2301] 14.4
## Hopmans et al. -1.8633 [-2.5886; -1.1380] 14.1
##
## Number of studies combined: k = 9
##
## SMD 95%-CI t p-value
## Random effects model -2.4486 [-3.3937; -1.5034] -5.97 0.0003
##
## Quantifying heterogeneity:
## tau^2 = 0.9152 [0.1979; 5.8046]; tau = 0.9567 [0.4449; 2.4093]
## I^2 = 67.3% [34.1%; 83.8%]; H = 1.75 [1.23; 2.48]
##
## Test of heterogeneity:
## Q d.f. p-value
## 24.48 8 0.0019
##
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model
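With the Hartung-Knapp adjustment, the confidence interval of the pooled effect is based on a t distribution with k - 1 degrees of freedom instead of the normal distribution. The sketch below reconstructs the interval from the printed estimate and t statistic. Because the printed t value is rounded, the result should match the output only to about two decimals.

```r
est     <- -2.4486  # pooled SMD from the output above
t_value <- -5.97    # printed t statistic (rounded)
k       <- 9        # number of studies

se_pooled <- est / t_value             # standard error implied by t = est / se
crit      <- qt(0.975, df = k - 1)     # t quantile with 8 degrees of freedom
ci        <- est + c(-1, 1) * crit * se_pooled
p_value   <- 2 * pt(-abs(t_value), df = k - 1)

print(round(c(lower = ci[1], upper = ci[2], p = p_value), 4))
```

With only 9 studies the t quantile is noticeably larger than 1.96, which is why the Hartung-Knapp interval is wider than a normal-theory interval would be for the same standard error.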
Plot forest
forest.meta(m.osats, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="OSATS in Surgery")
#dev.print(pdf, "figures/forest_OSATS.pdf", width=8, height=8)
Not many papers have focused on tool acceleration. Jerk (the third derivative of position, i.e. the derivative of acceleration) is much more popular.
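As an illustration of the relationship between position, acceleration and jerk, the sketch below differentiates a synthetic one-dimensional position signal with finite differences. The sampling interval and the signal are hypothetical; real tool-tracking data would be three-dimensional and noisy, and would usually need smoothing before differentiation.

```r
dt     <- 0.01                    # hypothetical sampling interval in seconds
time_s <- seq(0, 1, by = dt)
pos    <- time_s^3                # synthetic position; its true jerk is 6 everywhere

vel  <- diff(pos) / dt            # first derivative: velocity
acc  <- diff(vel) / dt            # second derivative: acceleration
jerk <- diff(acc) / dt            # third derivative: jerk
                                  # equivalent to diff(pos, differences = 3) / dt^3

print(range(jerk))
```

Each differencing step shortens the series by one sample and amplifies measurement noise, which is one practical reason jerk-based metrics are often computed on filtered trajectories.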